Fix HIP warp synchronization function conflicts for ROCm 7.0+ #15241

slojosic-amd · 2025-08-11T13:04:57Z

Summary

This PR should fix warp synchronization function conflicts that occur when rocWMMA headers are included.

Problem

ROCm includes native warp synchronization builtin functions (__shfl_sync, __shfl_xor_sync, etc.) that require 64-bit masks and have been available since ROCm 6.2. However, when rocWMMA headers are included, they pull in these native functions which conflict with the existing compatibility shims that map to the maskless __shfl equivalents.

The conflict occurs because:

Standard HIP headers (like hip_fp16.h) include amd_warp_sync_functions.h
This defines native __shfl_*_sync functions expecting 64-bit masks
Existing code uses 32-bit masks (0xffffffff) causing compilation failures
The native functions override any compatibility macros defined later
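
The mask-width part of this list can be illustrated on the host without any ROCm headers. The following is only a sketch, not the ROCm implementation: `is_valid_native_mask` is a hypothetical helper that models the 64-bit requirement stated above, and the example assumes a platform where `unsigned int` is 32 bits wide.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical model of the requirement described above: a native *_sync
// function accepts a mask only if it is a 64-bit integer type.
template <typename MaskT>
constexpr bool is_valid_native_mask() {
    return std::is_integral<MaskT>::value && sizeof(MaskT) == 8;
}

// Legacy call sites pass the literal 0xffffffff. On platforms with a 32-bit
// int this literal has type unsigned int, so it is only 4 bytes wide.
constexpr bool kLegacyMaskAccepted = is_valid_native_mask<decltype(0xffffffff)>();

// An explicitly 64-bit mask satisfies the modelled requirement.
constexpr bool kWideMaskAccepted = is_valid_native_mask<std::uint64_t>();
```

Under these assumptions `kLegacyMaskAccepted` is false and `kWideMaskAccepted` is true, which matches the compilation failures described for call sites that pass 32-bit masks to the native functions.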


IMbackK

This makes no sense.
The ROCm native __shfl_*_sync functions from amd_warp_sync_functions.h have always used 64-bit masks and have always, very rightly, failed you for passing in a 32-bit type.
But we don't use those; we instead #define the __shfl_*_sync functions to the __shfl equivalents, which have no mask.
It makes zero sense why rocWMMA would make a difference here, other than that the inclusion of its header is somehow causing the shfl functions from amd_warp_sync_functions.h to be used instead.

While switching over to the __shfl_*_sync functions from amd_warp_sync_functions.h might look attractive on the surface, it makes little sense for us right now, as they also just call __shfl (see https://github.com/ROCm/clr/blob/7f77ceeaf9317261039d8f88b5748cb514a88df9/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L298 ) and were added only in ROCm 6.2, while we still need to support at least ROCm 6.1, as this is what is in Debian stable.

This PR is going to be a NAK from me. Please identify how the wrong versions of the shfl functions are being used when built with rocWMMA instead.
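
The shim approach mentioned in this review can be modelled on the host as follows. This is a minimal sketch under stated assumptions, not ggml's or ROCm's code: `ModelWarp`, `model_shfl_xor`, `MODEL_SHFL_XOR_SYNC` and `model_warp_reduce_sum` are hypothetical names, the "warp" has only 8 lanes to keep the example short, and C++14 or later is assumed for the `constexpr` loops.

```cpp
// Host-side model of one small "warp": each slot is the value held by one lane.
// kLanes = 8 is a hypothetical size chosen for brevity.
constexpr int kLanes = 8;
struct ModelWarp { int v[kLanes]; };

// Stand-in for a maskless shuffle such as __shfl_xor: the calling lane reads
// the value held by the lane whose index is (lane XOR lane_mask).
constexpr int model_shfl_xor(const ModelWarp& w, int lane, int lane_mask) {
    return w.v[lane ^ lane_mask];
}

// Compatibility shim in the style described above: the CUDA-style mask
// argument is accepted for source compatibility and then discarded.
#define MODEL_SHFL_XOR_SYNC(mask, w, lane, lane_mask) \
    model_shfl_xor((w), (lane), (lane_mask))

// Butterfly sum written against the CUDA-style call signature.
constexpr int model_warp_reduce_sum(ModelWarp w) {
    for (int offset = kLanes / 2; offset > 0; offset >>= 1) {
        ModelWarp next = w;  // all lanes exchange values in the same step
        for (int lane = 0; lane < kLanes; ++lane) {
            next.v[lane] = w.v[lane] + MODEL_SHFL_XOR_SYNC(0xffffffff, w, lane, offset);
        }
        for (int lane = 0; lane < kLanes; ++lane) {
            w.v[lane] = next.v[lane];
        }
    }
    return w.v[0];  // after the last step every lane holds the full sum
}
```

Because the macro never forwards `mask`, its width is irrelevant in this model. That is why a 32-bit `0xffffffff` only becomes a problem once a native function with a real mask parameter is selected instead of the shim.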


slojosic-amd · 2025-08-12T11:00:31Z

Thanks @IMbackK. After deeper investigation, this has nothing to do with a "new" rocWMMA or with any "ROCm 7.0 warp mask breaking changes". The real issue is a macro change in https://github.com/ROCm/clr/blob/rocm-6.4.x-with-7.0-preview/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L30
What is now HIP_DISABLE_WARP_SYNC_BUILTINS was HIP_ENABLE_WARP_SYNC_BUILTINS in https://github.com/ROCm/clr/blob/rocm-6.3.x/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L30 so for ROCm 6.5+ we should #define HIP_DISABLE_WARP_SYNC_BUILTINS instead of #undef HIP_ENABLE_WARP_SYNC_BUILTINS in https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-cuda/fattn-wmma-f16.cu#L18 to prevent a conflict with the native built-in warp sync functions after including rocwmma.hpp. Please check: 74c7a2f

IMbackK · 2025-08-12T17:19:58Z

Yes, the change in behavior in amd_warp_sync_functions.h is indeed the cause of this issue, but this solution is still overly complicated; we don't need to handle separate cases here for different ROCm versions. I would suggest #15273 instead, as we don't make use of any of the functions this header defines anywhere in ggml.

slojosic-amd · 2025-08-12T19:13:48Z

@IMbackK I wanted the backward-compatible changes to be "more obvious", since the HIP_ENABLE_WARP_SYNC_BUILTINS macro was introduced in the hip.h file earlier and is relevant for ROCm 6.4 and older but not present in ROCm 7.0. Since you deleted the define/undef of HIP_ENABLE_WARP_SYNC_BUILTINS and just define HIP_DISABLE_WARP_SYNC_BUILTINS, which is present only in ROCm 6.5+, your solution is much cleaner and backward compatible 👍
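
Why a single define can cover both header generations can be sketched with plain preprocessor logic. The two guard conditions below are paraphrased from the description in this thread (older headers treated as opt-in via HIP_ENABLE_WARP_SYNC_BUILTINS, newer ones as opt-out via HIP_DISABLE_WARP_SYNC_BUILTINS); they are an assumption for illustration, not a copy of the ROCm headers, and the `k...ExposesBuiltins` constants are hypothetical names.

```cpp
// Defined once, before any HIP header would be included.
#define HIP_DISABLE_WARP_SYNC_BUILTINS 1

// Model of an older header as described in the thread: the native builtins
// appear only when HIP_ENABLE_WARP_SYNC_BUILTINS is defined (opt-in).
#if defined(HIP_ENABLE_WARP_SYNC_BUILTINS)
constexpr bool kOldHeaderExposesBuiltins = true;
#else
constexpr bool kOldHeaderExposesBuiltins = false;
#endif

// Model of a newer header as described in the thread: the native builtins
// appear unless HIP_DISABLE_WARP_SYNC_BUILTINS is defined (opt-out).
#if !defined(HIP_DISABLE_WARP_SYNC_BUILTINS)
constexpr bool kNewHeaderExposesBuiltins = true;
#else
constexpr bool kNewHeaderExposesBuiltins = false;
#endif
```

In this model neither generation exposes the native builtins: the older guard never sees its opt-in macro, and the newer guard sees the opt-out macro. An older header simply ignores a macro it does not know, so no version check is needed.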


slojosic-amd and others added 2 commits August 10, 2025 18:49

Fix HIP warp synchronization mask compatibility for ROCm 7.0+

ba72130

Merge branch 'ggml-org:master' into fix/slojosic-amd/warp_sync_rocm_7

e09f319

slojosic-amd requested a review from JohannesGaessler as a code owner August 11, 2025 13:04

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Aug 11, 2025

Fix CUDA/MUSA build: define GGML_WARP_SYNC_MASK in common.cuh

493f96a

JohannesGaessler reviewed Aug 11, 2025

ggml/src/ggml-cuda/common.cuh (outdated)

ggml/src/ggml-cuda/vendors/hip.h (outdated)

JohannesGaessler requested a review from IMbackK August 11, 2025 18:43

Addressed code review comments: GGML_WARP_SYNC_MASK renamed to GGML_CUDA_WARP_MASK

f264793

IMbackK suggested changes Aug 11, 2025


IMbackK mentioned this pull request Aug 11, 2025

hip : fix warp mask width for rocWMMA compatibility #15239

Closed

slojosic-amd changed the title ~~Fix HIP warp synchronization mask compatibility for ROCm 7.0+~~ Fix HIP warp synchronization function conflicts for ROCm 7.0+ Aug 11, 2025

slojosic-amd added 3 commits August 11, 2025 20:44

Revert all changes introduced in this PR

4d9df1d

Starting from ROCm 6.5 (aka 6.4 with 7.0 preview) HIP_ENABLE_WARP_SYNC_BUILTINS has been replaced with HIP_DISABLE_WARP_SYNC_BUILTINS (https://github.com/ROCm/clr/blob/rocm-6.4.x-with-7.0-preview/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L30)

74c7a2f

hipBLAS changes have been introduced also from ROCm 6.5 (aka 6.4 with 7.0 preview)

1299c04

slojosic-amd requested a review from IMbackK August 12, 2025 01:20

lhl mentioned this pull request Aug 12, 2025

For gfx1151, llama.cpp should be built with -DGGML_HIP_ROCWMMA_FATTN=ON for a big performance boost lemonade-sdk/llamacpp-rocm#7

Closed

IMbackK closed this Aug 12, 2025

slojosic-amd added a commit to ROCm/llama.cpp that referenced this pull request Aug 12, 2025

Revert all changes from PR #2: Fix HIP warp synchronization mask compatibility for ROCm 7.0+ due to requests and comments from ggml-org#15241

1c62c5a

slojosic-amd mentioned this pull request Aug 12, 2025

Revert PR #2 ROCm/llama.cpp#3

Merged

slojosic-amd deleted the fix/slojosic-amd/warp_sync_rocm_7 branch August 12, 2025 20:40

slojosic-amd mentioned this pull request Oct 29, 2025

rocwmma_patch.sh change lemonade-sdk/llamacpp-rocm#25

Open
